52 research outputs found

    Exploiting parallel and comparable corpora to detect lexical correspondences (application to the medical domain)

    No full text
    In this work, we seek to put two properties of text corpora (parallelism and comparability) to use for medical informatics by detecting lexical correspondences of two types: translations of medical terms, in order to enrich terminologies; and paraphrases between specialized and lay expressions, in order to help write documents for the general public. A first experiment relies on proven approaches and a parallel corpus, and implements corpus alignment methods. This yielded new French translations of English terms, some of which are now integrated into the MeSH thesaurus. A second experiment examines how monolingual comparable corpora can be exploited. Two methods were designed: the first searches for paraphrases of nominalizations; the second for paraphrases of neoclassical compounds. Various paraphrases that appear consistent with the specialized/lay opposition under study were obtained.

    Text-mining tools for extracting information about microbial biodiversity in food

    No full text
    Information on food microbial diversity is scattered across millions of scientific papers. Researchers need tools to assist their bibliographic search in such large collections. Text mining and knowledge engineering methods are useful for finding relevant information in the life sciences automatically and efficiently. This work describes how the Alvis text mining platform has been applied to a large collection of PubMed abstracts of scientific papers in the food microbiology domain. The information targeted by our work is microorganisms, their habitats and phenotypes. Two knowledge resources, the NCBI taxonomy and the OntoBiotope ontology, were used to detect this information in texts. The result of the text mining process was indexed and is presented through the AlvisIR Food online semantic search engine. In this paper, we also show through two illustrative examples the great potential of this new tool to assist in studies on ecological diversity and the origin of microbial presence in food.

    Participation of team LAI in the DEFT 2019 challenge

    No full text
    We present in this article the methods developed and the results obtained during our participation in task 3 of the DEFT 2019 evaluation campaign. We used simple rule-based or machine-learning approaches; while our results are very good on information that is simple to extract, such as the patient's age and gender, they remain mixed on the more difficult tasks.

    Defining Medical Words: Transposing Morphosemantic Analysis from French to English

    No full text
    MEDINFO. Medical language, like many technical languages, is rich in morphologically complex words, many of which have Greek and Latin roots, in which case they are called neoclassical compounds. Morphosemantic analysis can help generate definitions of such words. This paper reports work on the adaptation of a morphosemantic analyzer dedicated to French (DériF) to analyze English medical neoclassical compounds. It presents the principles of this transposition and its current performance. The analyzer was tested on a set of 1,299 compounds extracted from the WHO-ART terminology. Of these, 859 could be decomposed and defined, 675 of them successfully. An advantage of this process is that complex linguistic analyses designed for French could be successfully transferred to the analysis of English medical neoclassical compounds. Moreover, the resulting system can produce more complete analyses of English medical compounds than existing ones, including a hierarchical decomposition and semantic gloss of each word.

    Morphosemantic parsing of medical compound words: Transferring a French analyzer to English

    No full text
    Medical language, like many technical languages, is rich in morphologically complex words, many of which have Greek and Latin roots, in which case they are called neoclassical compounds. Morphosemantic analysis can help generate definitions of such words. The structure of these compounds has also been observed to be similar across several European languages, which suggests that the same linguistic analysis could be applied to neoclassical compounds from different languages with minor modifications. This paper reports work on the adaptation of a morphosemantic analyser dedicated to French (DériF) to analyse English neoclassical compounds. It presents the principles of this transposition and its current performance.

    Text mining tools for extracting information about microbial biodiversity in food

    No full text
    Introduction: Information on food microbial biodiversity is scattered across millions of scientific papers (2 million references in the PubMed bibliographic database in 2017). An exhaustive manual analysis of these documents is impossible. Text-mining and knowledge engineering methods can assist the researcher in finding relevant information. Material & Methods: We propose to study bacterial biodiversity using text-mining tools from the Alvis platform. First, we analyzed terms that designate Microbial and Habitat entities in text. Microorganism names were predicted using the NCBI taxonomy. Habitat entities were detected using the syntactic structure of the terms and the OntoBiotope ontology. This ontology has been specifically enriched for the recognition of food terms in text. Second, we predicted links between microorganisms and their habitats (labeled "Lives_in" relationships) using pattern-based and machine-learning-based methods. The results of the text-mining predictions are indexed and presented in a semantic search engine. Results: The AlvisIR search engine for microbe literature gives online access to 1.2 million PubMed abstracts (as of 2015), 13% of which are specific to food. This tool makes it possible to use text-mining results to search for information on bacterial biodiversity. It covers all types of microbial habitats to help understand the origin of microbial presence in food. Significance: This work presents the first semantic search engine dedicated to better understanding microbial food biodiversity from text.

    Text-mining needs of the food microbiology research community

    No full text
    To ensure the usefulness of a bioinformatics service, analysis of user needs is an essential step. Furthermore, if the service anticipates the identified needs, user acceptance is easier. The aim of this work is to provide an overview of the requirements of a microbial diversity research community for ontology-based text-mining applications. This study is part of the development of OpenMinTeD, the European infrastructure for text mining, which targets biodiversity among other research fields. The requirement analysis was carried out through targeted online surveys, interviews, focus group meetings and workshops. This work yields a detailed, up-to-date landscape of stakeholders (data providers, producers and consumers), their potential roles and their general expectations with respect to text-mining applications. We introduce a user-centered approach that focuses on the functional requirements of microbiologist end users, including application user interfaces. The resulting description of these needs guides current OpenMinTeD development in designing and developing activities within text-mining projects for the microbiology community.

    Text-mining tools for extracting information about microbial biodiversity in food

    No full text
    Article in press. Information on food microbial diversity is scattered across millions of scientific papers. Researchers need tools to assist their bibliographic search in such large collections. Text mining and knowledge engineering methods are useful for finding relevant information in the life sciences automatically and efficiently. This work describes how the Alvis text mining platform has been applied to a large collection of PubMed abstracts of scientific papers in the food microbiology domain. The information targeted by our work is microorganisms, their habitats and phenotypes. Two knowledge resources, the NCBI taxonomy and the OntoBiotope ontology, were used to detect this information in texts. The result of the text mining process was indexed and is presented through the AlvisIR Food online semantic search engine. In this paper, we also show through two illustrative examples the great potential of this new tool to assist in studies on ecological diversity and the origin of microbial presence in food.

    Text-mining and ontologies: new approaches to knowledge discovery of microbial diversity

    No full text
    Microbiology research has access to a very large amount of public information on the habitats of microorganisms. Many areas of microbiology research use this information, primarily in biodiversity studies. However, the habitat information is expressed in unstructured natural language, which hinders its exploitation at large scale. Similar habitats are very commonly described by different terms, e.g. intestine and gut, which makes them hard to compare automatically. A common reference for standardizing these habitat descriptions is a necessity, as argued by Ivana et al. (2010). We propose OntoBiotope, an ontology that we have been developing since 2010. OntoBiotope has a formal, machine-readable representation that enables indexing of information as well as conceptualization and reasoning.